SWOT Simulated North American Continent Hydrology Dataset Exploration in the Cloud

From the PO.DAAC Cookbook, to access the GitHub version of the notebook, follow this link.

Accessing and Visualizing SWOT Simulated Datasets

Requirement:

This tutorial can only be run in an AWS cloud instance running in us-west-2: NASA Earthdata Cloud data in S3 can be directly accessed via the earthaccess Python library, but this access is limited to requests made from within the US West (Oregon) (code: us-west-2) AWS region.
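As a quick sanity check before running the rest of the notebook, you can compare your instance's region against the required one. A minimal sketch (the environment-variable names are a common convention and are not guaranteed to be set in every instance):

```python
import os

def in_us_west_2(region):
    """Return True if the given AWS region string is us-west-2."""
    return region == "us-west-2"

# AWS_REGION / AWS_DEFAULT_REGION are commonly set in cloud instances,
# but may be absent; adjust for your environment.
region = os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION")
print(in_us_west_2(region))
```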

Learning Objectives:

  • Access all 5 products of SWOT HR sample data (archived in the NASA Earthdata Cloud) within the AWS cloud, without downloading them to a local machine
  • Visualize accessed data

SWOT Simulated Level 2 North America Continent KaRIn High Rate Version 1 Datasets:

  1. River Vector Shapefile - SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1

DOI: https://doi.org/10.5067/KARIN-2RSP1

  2. Lake Vector Shapefile - SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1

DOI: https://doi.org/10.5067/KARIN-2LSP1

  3. Water Mask Pixel Cloud NetCDF - SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXC_V1

DOI: https://doi.org/10.5067/KARIN-2PIX1

  4. Water Mask Pixel Cloud Vector Attribute NetCDF - SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXCVEC_V1

DOI: https://doi.org/10.5067/KARIN-2PXV1

  5. Raster NetCDF - SWOT_SIMULATED_NA_CONTINENT_L2_HR_RASTER_V1

DOI: https://doi.org/10.5067/KARIN-2RAS1

Notebook Author: Cassie Nickles, NASA PO.DAAC (Aug 2022)

Libraries Needed

import glob
import os
import requests
import s3fs
import netCDF4 as nc
import h5netcdf
import xarray as xr
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import hvplot.xarray
import shapefile as shp
import zipfile
import earthaccess
from earthaccess import Auth, DataCollections, DataGranules, Store

Earthdata Login

An Earthdata Login account is required to access data, as well as to discover restricted data, from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account; it is free to create and only takes a moment to set up. We use earthaccess to authenticate with your login credentials below.

auth = earthaccess.login(strategy="interactive", persist=True)

Set up an s3fs session for Direct Access

s3fs sessions provide authenticated access to the S3 bucket and allow for typical file-system-style operations. Below we create a session by passing in the data access information.

fs_s3 = earthaccess.get_s3fs_session(daac='PODAAC', provider='POCLOUD')

Single File Access

The s3 access link can be found using an earthaccess data search. Since this collection contains both Reach and Node files, we need to extract only the granule link for the Reach file. We do this by filtering for 'Reach' in the data link.

Alternatively, Earthdata Search (see tutorial) can be used to manually search for a single file.

1. River Vector Shapefiles

#retrieves granule from the day we want
river_results = earthaccess.search_data(short_name = 'SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1', temporal = ('2022-08-22 19:24:41', '2022-08-22 19:30:37'))
Granules found: 2
#finds the s3 link of the one granule we want (The collection contains both Reaches and Nodes, but here we want only the Reach)
river_data = []
for g in river_results:
    for l in earthaccess.results.DataGranule.data_links(g, access='direct'):
        if "Reach" in l:
            river_data.append(l)
print(river_data)
['s3://podaac-ops-cumulus-protected/SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1/SWOT_L2_HR_RiverSP_Reach_007_522_NA_20220822T192441_20220822T193037_PGA0_01.zip']
#creates s3 file object for the one link we want
s3_file_obj1 = fs_s3.open(river_data[0], mode='rb')

The native format for this sample data is a .zip file, and we want the .shp file inside, so we need to extract the contents of the zip file into the cloud environment. Below we extract into a folder called SWOT_HR_shp; change the path to wherever you would like the extracted files written.
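If you would like to see what the archive contains before extracting, zipfile can list member names from the same open file object. A minimal in-memory sketch (the member names are made up for illustration):

```python
import io
import zipfile

# Build a small in-memory zip standing in for the s3fs file object
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example.shp", b"")
    zf.writestr("example.dbf", b"")

# Listing members works the same way on the s3fs file object
with zipfile.ZipFile(buf, "r") as zf:
    names = zf.namelist()
print(names)
```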

with zipfile.ZipFile(s3_file_obj1, 'r') as zip_ref:
    zip_ref.extractall('SWOT_HR_shp')

Next, we’ll look at the attribute table of the .shp file we just extracted to the ‘SWOT_HR_shp’ folder.

SWOT_HR_shp1 = gpd.read_file('SWOT_HR_shp/SWOT_L2_HR_RiverSP_Reach_007_522_NA_20220822T192441_20220822T193037_PGA0_01.shp') 
SWOT_HR_shp1
reach_id time time_tai time_str p_lat p_lon river_name wse wse_u wse_r_u ... p_width p_wid_var p_n_nodes p_dist_out p_length p_maf p_dam_id p_n_ch_max p_n_ch_mod geometry
0 71224300241 7.145115e+08 7.145114e+08 2022-08-22T19:24:41Z 49.364818 -94.879318 no_data 3.472248e+01 -1.000000e+12 1.511000e-02 ... 7294.5 3.265803e+06 15 390935.258 3008.959150 -1.000000e+12 0 15 8 LINESTRING (-94.86483 49.37485, -94.86515 49.3...
1 71224300253 7.145115e+08 7.145115e+08 2022-08-22T19:24:46Z 49.049486 -94.899554 no_data 3.439994e+01 -1.000000e+12 7.600000e-03 ... 394.5 7.876447e+06 42 444613.943 8411.845753 -1.000000e+12 0 8 1 LINESTRING (-94.92557 49.08401, -94.92556 49.0...
2 71224300263 7.145115e+08 7.145115e+08 2022-08-22T19:24:48Z 48.977915 -94.869598 no_data 3.434701e+01 -1.000000e+12 9.620000e-03 ... 6365.5 2.935181e+06 42 453020.631 8406.687501 -1.000000e+12 0 3 1 LINESTRING (-94.88015 49.01512, -94.88006 49.0...
3 71224300273 7.145115e+08 7.145115e+08 2022-08-22T19:24:49Z 48.902998 -94.854720 no_data 3.416786e+01 -1.000000e+12 1.372000e-02 ... 4650.0 3.770782e+06 43 461636.940 8616.309267 -1.000000e+12 0 1 1 LINESTRING (-94.86229 48.94092, -94.86228 48.9...
4 71224300283 7.145115e+08 7.145115e+08 2022-08-22T19:24:50Z 48.883377 -94.783621 no_data 3.426341e+01 -1.000000e+12 5.050000e-03 ... 10439.0 2.952077e+07 46 470821.047 9184.106587 -1.000000e+12 0 5 1 LINESTRING (-94.73002 48.90430, -94.72952 48.9...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
666 74291700141 7.145116e+08 7.145116e+08 2022-08-22T19:27:29Z 39.811344 -92.684233 no_data 3.869180e+01 -1.000000e+12 3.840000e-01 ... 42.0 1.060619e+02 48 2463828.961 9553.659853 -1.000000e+12 0 1 1 LINESTRING (-92.68784 39.77391, -92.68804 39.7...
667 74291700151 7.145116e+08 7.145116e+08 2022-08-22T19:27:27Z 39.888856 -92.683047 no_data 3.687848e+01 -1.000000e+12 1.897500e-01 ... 42.0 1.208373e+02 48 2473386.489 9557.528221 -1.000000e+12 0 1 1 LINESTRING (-92.68497 39.84931, -92.68497 39.8...
668 74291700161 -1.000000e+12 -1.000000e+12 no_data 39.962507 -92.671510 no_data -1.000000e+12 -1.000000e+12 -1.000000e+12 ... 42.0 9.634731e+01 48 2482916.982 9530.492810 -1.000000e+12 0 1 1 LINESTRING (-92.66461 39.92629, -92.66425 39.9...
669 74291700171 7.145116e+08 7.145116e+08 2022-08-22T19:27:25Z 40.045903 -92.690485 no_data 3.647532e+01 -1.000000e+12 1.394000e-01 ... 54.0 6.557368e+02 59 2494780.714 11863.732260 -1.000000e+12 0 2 1 LINESTRING (-92.68041 40.00013, -92.68006 40.0...
670 74291700181 7.145116e+08 7.145116e+08 2022-08-22T19:27:23Z 40.133798 -92.687305 no_data 3.384407e+01 -1.000000e+12 1.019290e+00 ... 42.0 1.250056e+02 58 2506424.880 11644.165636 -1.000000e+12 0 1 1 LINESTRING (-92.68547 40.09150, -92.68519 40.0...

671 rows × 111 columns

fig, ax = plt.subplots(figsize=(11,7))
SWOT_HR_shp1.plot(ax=ax, color='black')
<Axes: >

2. Lake Vector Shapefiles

The lake vector shapefiles can be accessed in the same way as the river shapefiles above.

lake_results = earthaccess.search_data(short_name = 'SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1', temporal = ('2022-08-22 19:24:18', '2022-08-22 19:30:50'))
Granules found: 3
#find the s3 link of the desired granule (This collection has three options: Obs, Unassigned, and Prior - we want Obs)
lake_data = []
for g in lake_results:
    for l in earthaccess.results.DataGranule.data_links(g, access='direct'):
        if "Obs" in l:
            lake_data.append(l)
print(lake_data)
['s3://podaac-ops-cumulus-protected/SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1/SWOT_L2_HR_LakeSP_Obs_007_522_NA_20220822T192415_20220822T193051_Dx0000_01.zip']
s3_file_obj2 = fs_s3.open(lake_data[0], mode='rb')
with zipfile.ZipFile(s3_file_obj2, 'r') as zip_ref:
    zip_ref.extractall('SWOT_HR_shp')
SWOT_HR_shp2 = gpd.read_file('SWOT_HR_shp/SWOT_L2_HR_LakeSP_Obs_007_522_NA_20220822T192415_20220822T193051_Dx0000_01.shp') 
SWOT_HR_shp2
obs_id lake_id overlap time time_tai time_str wse wse_u wse_r_u wse_std ... iono_c xovr_cal_c p_name p_grand_id p_max_wse p_max_area p_ref_date p_ref_ds p_storage geometry
0 742081R000002 7420470702 93 7.145116e+08 7.145116e+08 2022-08-22T19:26:51 36.934 0.051 0.051 0.159 ... 0.0 0.0 no_data -99999999 -1.000000e+12 1.35 -9999 -9999.0 -1.000000e+12 POLYGON ((-92.75926 42.04142, -92.75977 42.041...
1 742081R000003 7420472462 75 7.145116e+08 7.145116e+08 2022-08-22T19:26:51 37.037 0.080 0.080 0.143 ... 0.0 0.0 no_data -99999999 -1.000000e+12 1.62 -9999 -9999.0 -1.000000e+12 POLYGON ((-92.91651 42.01167, -92.91681 42.011...
2 742081R000008 7420473212 58 7.145116e+08 7.145116e+08 2022-08-22T19:26:51 36.578 0.181 0.181 0.058 ... 0.0 0.0 HENDRICKSON MARSH LAKE -99999999 -1.000000e+12 45.94 -9999 -9999.0 -1.000000e+12 POLYGON ((-93.24060 41.93319, -93.24066 41.933...
3 742081R000009 7420470712 73 7.145116e+08 7.145116e+08 2022-08-22T19:26:51 36.910 0.110 0.110 0.136 ... 0.0 0.0 no_data -99999999 -1.000000e+12 4.50 -9999 -9999.0 -1.000000e+12 POLYGON ((-92.72557 42.03424, -92.72560 42.034...
4 742081R000011 7420470582 76 7.145116e+08 7.145116e+08 2022-08-22T19:26:51 36.904 0.109 0.109 0.628 ... 0.0 0.0 no_data -99999999 -1.000000e+12 1.89 -9999 -9999.0 -1.000000e+12 POLYGON ((-93.39929 41.90871, -93.39945 41.908...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17240 742070R000729 7420857422 92 7.145115e+08 7.145115e+08 2022-08-22T19:25:10 33.186 0.098 0.098 0.056 ... 0.0 0.0 no_data -99999999 -1.000000e+12 14.94 -9999 -9999.0 -1.000000e+12 POLYGON ((-95.07581 47.57853, -95.07587 47.578...
17241 742070R000730 7420848152 96 7.145115e+08 7.145115e+08 2022-08-22T19:25:10 33.071 0.038 0.038 0.174 ... 0.0 0.0 no_data -99999999 -1.000000e+12 3.06 -9999 -9999.0 -1.000000e+12 POLYGON ((-94.88532 47.61669, -94.88577 47.616...
17242 712070R000731 7120812272 74 7.145115e+08 7.145115e+08 2022-08-22T19:25:10 33.104 0.077 0.077 0.085 ... 0.0 0.0 no_data -99999999 -1.000000e+12 4.41 -9999 -9999.0 -1.000000e+12 POLYGON ((-95.08313 47.57179, -95.08341 47.571...
17243 712070R000732 7120816202 67 7.145115e+08 7.145115e+08 2022-08-22T19:25:10 32.713 0.106 0.106 0.198 ... 0.0 0.0 MUD LAKE -99999999 -1.000000e+12 6.30 -9999 -9999.0 -1.000000e+12 POLYGON ((-95.39968 47.51004, -95.39975 47.510...
17244 742070R000733 7420857422 95 7.145115e+08 7.145115e+08 2022-08-22T19:25:10 32.725 0.093 0.093 0.119 ... 0.0 0.0 no_data -99999999 -1.000000e+12 14.94 -9999 -9999.0 -1.000000e+12 POLYGON ((-95.07228 47.57473, -95.07257 47.574...

17245 rows × 43 columns

fig, ax = plt.subplots(figsize=(7,12))
SWOT_HR_shp2.plot(ax=ax, color='black')
<Axes: >

3. Water Mask Pixel Cloud NetCDF

Accessing the remaining files is different from the .shp files above. We do not need to unzip these files because they are stored as native netCDF files in the cloud. For the rest of the products, we will open them via xarray.

watermask_results = earthaccess.search_data(short_name = 'SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXC_V1', temporal = ('2022-08-22 19:29:00', '2022-08-22 19:29:11'), point = ('-90', '35'))
Granules found: 1

The pixel cloud netCDF files are formatted with three groups titled "pixel_cloud", "tvp", and "noise" (more detail here). In order to access the coordinates and variables within the file, a group must be specified when calling xarray's open_dataset.

ds_PIXC = xr.open_mfdataset(earthaccess.open([watermask_results[0]]), group = 'pixel_cloud', engine='h5netcdf')
ds_PIXC
 Opening 1 granules, approx size: 0.0 GB
<xarray.Dataset>
Dimensions:                                (points: 489673, complex_depth: 2)
Coordinates:
    latitude                               (points) float64 dask.array<chunksize=(489673,), meta=np.ndarray>
    longitude                              (points) float64 dask.array<chunksize=(489673,), meta=np.ndarray>
Dimensions without coordinates: points, complex_depth
Data variables: (12/49)
    azimuth_index                          (points) float64 dask.array<chunksize=(489673,), meta=np.ndarray>
    range_index                            (points) float64 dask.array<chunksize=(489673,), meta=np.ndarray>
    interferogram                          (points, complex_depth) float32 dask.array<chunksize=(489673, 2), meta=np.ndarray>
    power_plus_y                           (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    power_minus_y                          (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    coherent_power                         (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    ...                                     ...
    solid_earth_tide                       (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    load_tide_fes                          (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    load_tide_got                          (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    pole_tide                              (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    ancillary_surface_classification_flag  (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    pixc_qual                              (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
Attributes:
    description:                 cloud of geolocated interferogram pixels
    interferogram_size_azimuth:  2923
    interferogram_size_range:    4575
    looks_to_efflooks:           1.75
plt.scatter(x=ds_PIXC.longitude, y=ds_PIXC.latitude, c=ds_PIXC.height)
plt.colorbar().set_label('Height (m)')

4. Water Mask Pixel Cloud Vector Attribute NetCDF

vector_results = earthaccess.search_data(short_name = 'SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXCVEC_V1', temporal = ('2022-08-22 19:29:00', '2022-08-22 19:29:11'), point = ('-90', '35'))
Granules found: 1
ds_PIXCVEC = xr.open_mfdataset(earthaccess.open([vector_results[0]]), decode_cf=False,  engine='h5netcdf')
ds_PIXCVEC
 Opening 1 granules, approx size: 0.0 GB
<xarray.Dataset>
Dimensions:               (points: 489673, nchar_reach_id: 11,
                           nchar_node_id: 14, nchar_lake_id: 10,
                           nchar_obs_id: 13)
Dimensions without coordinates: points, nchar_reach_id, nchar_node_id,
                                nchar_lake_id, nchar_obs_id
Data variables:
    azimuth_index         (points) int32 dask.array<chunksize=(489673,), meta=np.ndarray>
    range_index           (points) int32 dask.array<chunksize=(489673,), meta=np.ndarray>
    latitude_vectorproc   (points) float64 dask.array<chunksize=(489673,), meta=np.ndarray>
    longitude_vectorproc  (points) float64 dask.array<chunksize=(489673,), meta=np.ndarray>
    height_vectorproc     (points) float32 dask.array<chunksize=(489673,), meta=np.ndarray>
    reach_id              (points, nchar_reach_id) |S1 dask.array<chunksize=(489673, 11), meta=np.ndarray>
    node_id               (points, nchar_node_id) |S1 dask.array<chunksize=(489673, 14), meta=np.ndarray>
    lake_id               (points, nchar_lake_id) |S1 dask.array<chunksize=(489673, 10), meta=np.ndarray>
    obs_id                (points, nchar_obs_id) |S1 dask.array<chunksize=(489673, 13), meta=np.ndarray>
    ice_clim_f            (points) int8 dask.array<chunksize=(489673,), meta=np.ndarray>
    ice_dyn_f             (points) int8 dask.array<chunksize=(489673,), meta=np.ndarray>
Attributes: (12/36)
    Conventions:                                 CF-1.7
    title:                                       Level 2 KaRIn high rate pixe...
    institution:                                 CNES
    source:                                      Simulation
    history:                                     2021-04-14 18:11:49Z: Creation
    platform:                                    SWOT
    ...                                          ...
    xref_input_l2_hr_pixc_vec_river_file:        /work/ALT/swot/swotdev/desro...
    xref_static_river_db_file:                   
    xref_static_lake_db_file:                    /work/ALT/swot/swotpub/BD/BD...
    xref_l2_hr_lake_tile_config_parameter_file:  /work/ALT/swot/swotdev/desro...
    ellipsoid_semi_major_axis:                   6371008.771416667
    ellipsoid_flattening:                        0.0
pixcvec_htvals = ds_PIXCVEC.height_vectorproc.compute()
pixcvec_latvals = ds_PIXCVEC.latitude_vectorproc.compute()
pixcvec_lonvals = ds_PIXCVEC.longitude_vectorproc.compute()

#Before plotting, we set all fill values to nan so that the graph shows up better spatially
pixcvec_htvals[pixcvec_htvals > 15000] = np.nan
pixcvec_latvals[pixcvec_latvals > 80] = np.nan
pixcvec_lonvals[pixcvec_lonvals > 180] = np.nan
plt.scatter(x=pixcvec_lonvals, y=pixcvec_latvals, c=pixcvec_htvals)
plt.colorbar().set_label('Height (m)')
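The same fill-value masking can also be expressed with NumPy's where (or xarray's .where method), which returns a new array instead of assigning in place. A small sketch with made-up height values:

```python
import numpy as np

# Heights with the simulated product's large fill value mixed in
heights = np.array([12.3, 1.0e12, 15.7, 1.0e12])

# Replace fill values (> 15000) with NaN rather than mutating in place
masked = np.where(heights > 15000, np.nan, heights)
print(masked)
```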

5. Raster NetCDF

raster_results = earthaccess.search_data(short_name = 'SWOT_SIMULATED_NA_CONTINENT_L2_HR_RASTER_V1', temporal = ('2022-08-22 19:28:50', '2022-08-22 19:29:11'), point = ('-90', '35'))
Granules found: 2
#this collection has 100m and 250m granules, but we only want 100m
raster_data = []
for g in raster_results:
    for l in earthaccess.results.DataGranule.data_links(g, access='direct'):
        if "100m" in l:
            raster_data.append(l)
print(raster_data)
['s3://podaac-ops-cumulus-protected/SWOT_SIMULATED_NA_CONTINENT_L2_HR_RASTER_V1/SWOT_L2_HR_Raster_100m_UTM15S_N_x_x_x_007_522_047F_20220822T192850_20220822T192911_Dx0000_01.nc']
ds_raster = xr.open_mfdataset(earthaccess.open([raster_data[0]], provider = 'POCLOUD'), engine='h5netcdf')
ds_raster
<xarray.Dataset>
Dimensions:                (x: 1543, y: 1540)
Coordinates:
  * x                      (x) float64 6.567e+05 6.568e+05 ... 8.109e+05
  * y                      (y) float64 3.775e+06 3.775e+06 ... 3.929e+06
Data variables: (12/30)
    crs                    object ...
    longitude              (y, x) float64 dask.array<chunksize=(1540, 1543), meta=np.ndarray>
    latitude               (y, x) float64 dask.array<chunksize=(1540, 1543), meta=np.ndarray>
    wse                    (y, x) float32 dask.array<chunksize=(1540, 1543), meta=np.ndarray>
    wse_uncert             (y, x) float32 dask.array<chunksize=(1540, 1543), meta=np.ndarray>
    water_area             (y, x) float32 dask.array<chunksize=(1540, 1543), meta=np.ndarray>
    ...                     ...
    load_tide_fes          (y, x) float32 dask.array<chunksize=(1540, 1543), meta=np.ndarray>
    load_tide_got          (y, x) float32 dask.array<chunksize=(1540, 1543), meta=np.ndarray>
    pole_tide              (y, x) float32 dask.array<chunksize=(1540, 1543), meta=np.ndarray>
    model_dry_tropo_cor    (y, x) float32 dask.array<chunksize=(1540, 1543), meta=np.ndarray>
    model_wet_tropo_cor    (y, x) float32 dask.array<chunksize=(1540, 1543), meta=np.ndarray>
    iono_cor_gim_ka        (y, x) float32 dask.array<chunksize=(1540, 1543), meta=np.ndarray>
Attributes: (12/45)
    Conventions:                     CF-1.7
    title:                           Level 2 KaRIn High Rate Raster Data Product
    institution:                     JPL
    source:                          Large scale simulator
    history:                         2021-09-08T22:28:33Z : Creation
    mission_name:                    SWOT
    ...                              ...
    utm_zone_num:                    15
    mgrs_latitude_band:              S
    x_min:                           656700.0
    x_max:                           810900.0
    y_min:                           3775000.0
    y_max:                           3928900.0

It’s easy to analyze and plot the data with packages such as hvplot!

ds_raster.wse.hvplot.image(y='y', x='x')